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Abstract 



We propose a continuous model for evolutionary rate variation 
across sites and over the tree and derive exact transition probabilities 
under this model. Changes in rate are modelled using the CIR process, 
a diffusion widely used in financial applications. The model directly 
extends the standard gamma distributed rates across site model, with 
one additional parameter governing changes in rate down the tree. The 
parameters of the model can be estimated directly from two well-known 
statistics: the index of dispersion and the gamma shape parameter of 
the rates across sites model. The CIR model can be readily incor- 
porated into probabilistic models for sequence evolution. We provide 
here an exact formula for the likelihood of a three taxa tree. Larger 
trees can be evaluated using Monte-Carlo methods. 



Keywords Evolutionary rate; Molecular clocks; CIR process; Diffusion 
processes; Covarion; Phylogenetics. 
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1 Introduction 



Understanding evolutionary rates and how they vary is one of the central 
concerns of molecular evolution. It has been clearly shown that inadequate 
models of rate variation, between lineages and between loci, can dramatically 
affect the accuracy of phylogenetic inference The dependency of 

molecular dating on evolutionary rate models is even more critical: we will 
only obtain precise divergence time estimates from molecular data once we 
can model the rate at which sequences evolve (U |S] . 

Modelling the evolutionary rate is made difficult by the number and va- 
riety of factors influencing it. The base rate of mutation can vary because of 
changes in the accuracy of transcription machinery [5], DNA repair mech- 
anisms [7], and metabolic rate [S]. At the cellular level, selective pressures 
can lead to variation of rate between loci and over time, as evidenced by 
differential rates of the three codon position |3 EI] , the slower evolutionary 
rate of highly expressed genes ^Tj, and the effect of tertiary structure on 
patterns of sequence conservation 12 . 

Selection also affects the evolutionary rate at the level of populations. 
For the most part, the only mutations that affect phylogenetics are those 
that are fixed in the population. Hence evolutionary rate is a combination 
of mutation rate and fixation rate. Fluctuations in population size, gener- 
ation times, and environmental pressures affect fixation rates and thereby 
influence evolutionary rate EEH EE! • 

Because of this complexity, the strategies employed for modelling evolu- 
tionary rate have tended to be statistical in nature. As with all statistical 
inference, there is an iterative sequence of model formulation, model as- 
sessment, and model improvement. The aim is to construct a model that 
accurately explains the observed variation but is as simple, and tractable, 
as possible. 

Our goal in this paper is to derive a continuous model for rate evolu- 
tion that avoids many of the problems of existing approaches. We base 
our model on the CIR process, a continuous Markov process that is widely 
used in finance to model interest rates [EJ. As we shall see, the model fits 
well into existing protocols for phylogenetic inference. The process has a 
stationary distribution given by a gamma distribution and yet, unlike the 
rates-across-sites (RAS) model of Uzzell and Corbin |17j . the rate is allowed 
to vary along lineages. The CIR model adds only one parameter to the 
RAS model, and this parameter can be estimated directly from the index of 
dispersion or the autocorrelation (see below). Furthermore, we can derive 
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exact transition probabilities when we incorporate CIR based rate variation 
into the standard models for sequence evolution. 

The outline of the paper is as follows: 

• In the following section we summarise the key characteristics of models 
for rate evolution, and show how existing models are classified with 
respect to these characteristics. 

• In Section El we present the CIR model for rate evolution and discuss 
its basic properties. 

• In Section |1] we derive transition probabilities for standard mutation 
models where the rate is described as a Markov process. 

• In Section we focus on the case where the rate is modelled by a CIR 
process. 

• In Section H3 we extend this one step further to derive an expression 
for likelihood of a three-taxa tree using a mutation model with rate 
determined by the CIR process. We note that three-taxa trees are 
often used to study differences in evolutionary rate. 

We conclude with an outline of future work and work in progress. 

In a companion paper (in preparation) we describe the incorporation of 
this model into software for Bayesian phylogenetic inference, and use this to 
show how our model captures important information lost in standard RAS 
approaches. 

2 Properties of models for rate variation 

In this section we examine several important characteristics that can be used 
to distinguish, and choose between, different models for rate variation. We 
discuss how the different existing models fit into this scheme and summarise 
the differences between them in Tabled 

The rate of evolution for a given locus at time t > is denoted by Rt- 
For each t > 0, Rt is a non-negative random variable, and different models 
of rate evolution give different distributions for the rates Rt, t > 0. 

Here and throughout the paper we will restrict out attention to Markov 
processes. That is, for any t± < t<i < £3, we assume that R(t^) conditioned 
on R{t2) is independent of R{t\). In other words, the future depends on the 
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past only through the present. 

Property I: Continuous or Discontinuous Sample Paths 

The first characteristic is whether sample paths of the process are con- 
tinuous or discontinuous with respect to time. Typically, models with dis- 
continous paths have rates Rt that are constant except for discrete points 
in time at which there is a jump in the value (Figure [J- 1(a)). If the number 
of possible values for the rate is finite, then the rate can easily be described 
as a continuous-time Markov chain with a infinitesimal rate matrix. For ex- 
ample, in the covarion process ^H] the basic rates are 'off' (Rt = 0) or 'on' 
(Rt = 1) and transitions occur between them at exponentially distributed 
random time intervals. Galtier |19| I2f)j generalizes this process to one with 
more than two possible states. In other models, the range of possible val- 
ues for the rate is continuous, as in the model of Huelsenbeck [2^, where 
a rate change event consists of multiplying the previous rate by a gamma 
random variable. The rate change events are still discrete and exponentially 
distributed. 

There also are models that describes the rate as a continuous function 
with time, and the most important class of Markov processes with continuous 
paths are diffusions (Figure E-2(a)). Examples include the CIR process 
presented here, the Ornstein-Uhlenbeck model of Aris-Brosou and Yang [3J 
122) . and the log-normal model of Kishino et al. [51 123j . 

Finally, it is also possible for Rt to make jumps in value at a discrete set 
of times while also changing continuously in between these points. 

Property 11: Long Term Behaviour and Ergodicity 

The second property we consider is the distribution of Rt as t goes to 
infinity, that is, the distribution of the rate of evolution in the long term. 
Surprisingly, many models of rate evolution are very badly behaved in the 
limit. 

One problematic class of processes that have already been applied to 
rates in phylogenetics is the martingales. We say that a Markov process is 
a martingale if, for all s,t > we have E[Mt +s |Mj] = Mt |24j . An example 
of of a Markov martingale is Brownian motion. As a result of this fairly 
innocuous looking condition, a martingale Mt has the property that either 
E[|Mt|] is unbounded in time or Alt converges to a random constant |25) . 
Either possibility is undesirable from a modelling point of view. This may 
not be a problem if we only look at the process over a finite time, but 
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1(b) 2(b) 



Figure 1: A representation of the two classes of rate process with respect 
to the classification of property I. On top are examples of the rate history. 
Below are the corresponding integrated rates r(t) = f^ =Q R s ds. The figures 
l(a-b) refer to a continuous-time Markov chain with discrete rate change 
events, and in figures 2(a-b), R(t) is modelled as a diffusion process, with 
continuous paths. 



neither is it particularly desirable. The processes of Kishino et al. I23j 
and Huelsenbeck et al. all have the property that either Rt or log(i?t) 
is a martingale. 

At a purely theoretical level, we observe that an ever-increasing variance 
will result for almost any signal that is only driven by its initial value and a 
stochastic force, with no directional bias. The position of a particle subjected 
to a random force produced by collisions with other particles is a classical 
example of such a case. In our context, the effects on the evolutionary rate 
are not independent of the actual rate : whatever the theoretical framework 
we consider, a high evolutionary rate is not as likely to increase (or to stay 
in high values) as to go back to smaller values. The episodic evolution 
fits particularly well to this idea, where periods of drastic adaptation with 
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high evolutionary rates are naturally followed by periods where a population 
is adapted and its genome evolves much more slowly. Even according to 
the neutral theory, as argued by Takahata |14j . the overall dynamic of the 
rate should behave as a random function that takes high values whenever 
bottlenecks occur, and goes back to small values afterwards. 

The concept of ergodicity naturally arises from this observation. We say 
that a Markov process is ergodic if for any initial rate Rq the distribution 
of Rt converges to a unique distribution as t goes to infinity. The limiting 
distribution is known as the invariant or stationary distribution. Examples 
of ergodic processes include the OU process, the CIR model and (usually) 
the discrete space covarion and covarion-type models [26 | IT9 l [20] . 

One possible way for a process to not be ergodic is if the distribution of 
Rt does not converge for large t for some initial rate Rq. This must be the 
case if Rt is a martingale and does not converge to a constant, as is the case 
with Brownian motion. Another possibility is that Rt converges to different 
stationary distributions for different values of Rq. 

Property III: Tractability 

A highly desirable feature of any model is its tractability, both math- 
ematical (does there exist a closed formula?) and computational (can we 
compute probabilities efficiently?). Nowadays, Monte Carlo methods make 
it possible to use arbitrarily complex models: however, explicit analytical 
formulae allow for more efficient sampling |27| . 

There are several probability distribution functions that are important 
to have when working with rate processes. The most basic is the distribution 
of the rate Rt given the rate at time t = 0. This we have for the models 
H HI HS1 and for the CIR model, but not for the models of [H]. 

In phylogenetics we incorporate the model for evolutionary rate into the 
mutation model for sequence evolution at a site. These interact to give a 
joint process (Rt,Xt) for both the rate Rt at time t and the nucleotide or 
protein Xt at time t. To evaluate the likelihood we require an expression for 
the joint conditional probability 

P[X t =j,R t = s\X = i,R =r] (1) 

of going from one nucleotide (or amino acid) state and rate state to another 
pair of states. Even though it is sometimes possible to perform Monte 
Carlo computations to estimate this probability without a formula (as in 
|21|). having a formula will speed up the computations significantly without 
having to resort to approximations, as in \2'A\ 15] . 
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Property IV: Autocovariance and dispersion 



There is general agreement |28| . |29| . |3U| on the relevance of autocorre- 
lation in the modelling of evolutionary rate. Broadly speaking, if the various 
causes that explain rate variation (generation time, population size, environ- 
mental fitness) vary with time, it should be reflected in rate variations. The 
extent to which the rate varies can be studied using the index of dispersion 
(Kimura, [3^ [32]) Langley and Fitch jUS])- Let N(t) be the number of sub- 
stitutions or mutations of a sequence over time t. The index of dispersion 
I(t) is defined as 

_ Var[iV(t)] 
{t) E[N(t)] ■ { ) 

This statistic can be estimated by comparing the number of substitutions 
that have accumulated in different lineages The population genetics 

community has proposed different models to account for a high index of 
dispersion ([Tl], [SHI)) and any reasonable model should yield an index of 
dispersion of at least one. 

The index of dispersion resulting from a particular model of rate varia- 
tion is a function of the autocovariance of that model. The autocovariance 
for a process Rt is defined by 

p(t) = Cov(R ,Rt). (3) 

For many processes we can derive an explicit formula for the autocovariance. 
If we assume that the substitutions occur according to a Poisson process with 
rate governed by our rate process (that is, the substitutions follow a doubly 
stochastic or Cox process, Section^} and the rate process has autocovariance 
function p(t) then 

2 f* (1 - f) p(s)ds 

J ^ = 1+ e(A) ■ < 4 > 

as stated by a theorem in [HE], and the stationary index of dispersion ,37„ is 
then 

/(oo) = lim I(t) = 1 + 2 ^°° P(5)dS , (5) 

i^oo p 

provided that p, the stationary mean of the process R(t), and the limit, 
exist. Note that if there is any variation in rate then the index of dispersion 
will be greater than one |37j . 
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Some rate models in phylogenetics |23l \22\ don't model explicitly the 
rate, but instead assign a (fixed) rate to each branch, so that the expected 
number of substitutions on a particular branch is equal to its length times 
its assigned (constant) rate. 

A close look at the log-normal model from Thorne et al. [S| , which differs 
from their previous version [23] in that the rate is explicitly modelled, we 
suggest that the rate has constant autocovariance, since this rate process is 
close to a transform of the Brownian motion, and Brownian motion has a 
constant autocovariance function. Put into equation ©, we see that the the 
index of dispersion diverges. This problematic result illustrates the necessity 
of a balance between the presence of autocorrelation on one side, and the 
decrease of autocorrelation on a large time scale. 

Property V: Heterotachy or Homotachy 

There are two general ways that models for evolutionary rate can be 
incorporated into phylogenetics. On one hand, we have rate variation among 
lineages that applies to all sites (or loci) together. This can be modelled by 
trees for which the paths from the root to the leaves have different lengths. 
The rate variation explains the extent to which the evolution of the sequences 
has violated the molecular clock. Alternatively, we can introduce a distinct 
rate process for each site or locus. This models heterotachy, where the 
lineage rate changes are site-specific j2j. The transition probabilities that we 
derive in Section 0] can be applied to homotachous as well as heterotachous 
models. 

3 A Continuous diffusion model for the evolution- 
ary rate 

A Markov process with continuous paths and satisfying some additional 
smoothness conditions on its transition probabilities f3E, is called a diffu- 
sion. There are many ways of specifying a diffusion process: perhaps the 
most intuitive one is by giving the probability distribution function (pdf) 
of Rt given Rq = ro, for arbitrary ro-We denote this pdf by fii[Rt\ro]. For 
example, Brownian motion with parameter a 2 is defined by the condition 
that fn{Rt\ro] is a normal density with mean ro and variance a 2 t. 

A mathematically convenient representation of a diffusion is by means of 
a stochastic differential equation (SDE). In the same way that a dynamical 
system can be defined as the solution of a differential equation, a diffusion 
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Table 1: Models for the substitution rate, classified according to the prop- 
erties of section |2 . CTMC stands for "Continuous-Time Markov Chain" . 



process Rt can be defined as the solution of an equation taking the general 
form (see [21] P-61) 

dR t = a(t, Rt)dt + (3(t, R t )dB t . (6) 

Here, a(t, R t ) represents the deterministic effect on R t , (3(t, Rt) the stochas- 
tic part, and dBt is an infinitesimal "random" increment. Brownian motion 
corresponds to the case when a(t,Rt) = for all t, (3{t,Rt) is constant and 
the SDE becomes 

dR t = adB(t). 

Note that if /3(t,Rt) = for all t and Rt then (jEJ) becomes a deterministic 
ordinary differential equation. 

Going from an SDE such as © to a pdf for the diffusion involves solving 
a variable-coefficient second-order partial differential equation (PDE). For 
general functions a and (3 this PDE has no analytic solution. There are 
very few diffusions known that have closed form equations for their pdfs, 
and even fewer of these are ergodic. The simplest ergodic diffusions with 
closed-form expressions for the pdf are the Ornstein-Uhlenbeck and the CIR 
(Cox-Ingersoll-Ross) jT^] processes. 
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The Ornstein Uhlenbeck (OU) process is described by the SDE 



dR t = -bR t dt + adB t . 

The pdf for Rt given R$ = ro is the normal density with mean roe~ bt and 
variance a 2 (l — e~ 29t ). Its stationary distribution is normal with mean 
and variance a 2 . The OU process was used by Aris-Brosou and Yang 
\'2'2\ to model evolutionary rates. However, the OU process can take on 
negative values, and it is not clear how it can be used directly without 
any transformation, such as a reflected OU or a squared OU. Aris-Brosou 
and Yang also proposed another model, the EXP (for exponential) model, 
defined as the following : the rate assigned to a branch is drawn from an 
exponential distribution with mean equal to the rate of its ancestral branch. 
It is then obvious that their EXP model was a martingale. They outlined 
that the OU model seemed to provide a better fit to their data than the 
EXP model. Even if the reason of this better fit is still to be investigated, 
it seems reasonable to think that the ergodic property of the OU model 
could be a important factor. They also mentioned that the a 2 parameter 
of the OU model was hard to infer, perhaps because the OU model has an 
insufficient number of free parameters. 

The use of the CIR model solves the problem, since it is a generaliza- 
tion of the squared OU process, where the mean and the variance can be 
independently inferred by the addition of a third parameter. If the mean 
is fixed to one, we avoid any identifiability problem with branch lengths 
without fixing the variance, which can therefore be inferred as well as the 
autocorrelation. 

The CIR process satisfies the SDE 

dR t = b(a- R t )dt + av r R t dB t , (7) 

and the pdf fii(Rt\ro) for Rt given Ro = ro is a non-central \ 2 distribution 
with degree of freedom Aab/a 2 and parameter of non-centrality -bt\ - 
Its mean and variance are equal to 



E[r t ] = ro e- bt + a(l - e- bt ) (8) 
VarN = r ^(e- bt -e- 2bt ) + ^(l-e- bt ) 2 - (9) 

The stationary distribution of Rt is a gamma distribution with shape 
parameter 2ab/a 2 and scale parameter a 2 /2b. Hence the mean of the sta- 
tionary distribution is a and the variance is |16| . 
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Unlike an OU process, if ro, a, and b are all positive a CIR process is 
always non-negative. The square of an OU process is a special case of the 
CIR process. Furthermore, by multiplying Rt by a constant in equation (J7J), 
we see that multiplying a CIR process by a positive constant gives another 
CIR process. 

The covariance of the stationary CIR process can be exactly computed 

as 



From this, © leads to a closed formula for the index of dispersion: 



In Section [21 we emphasized that the concept of autocovariance is close 
to the index of dispersion. As Zheng showed [35], the effect of complex 
infinitesimal rate matrices on the index of dispersion (with constant rate) is 
not likely to explain alone the observed large empirical values. If the rate 
varies, Cutler [301 showed that an elevated index of dispersion can only be 
achieved if the time-scale of the rate process is approximately of the same 
magnitude as the substitution process itself. The CIR process provides the 
possibility to satisfy this property, while incorporating autocovariance. It is 
the consensus of these two ideas that should guide our choice for the rate of 
evolution. 

From ((7J) we see that the CIR process possesses three parameters, namely 
the stationary mean a, the stationary variance a 2 , and the intensity of the 
force that drives the process to its stationary distribution, b. The parameter 
b determines how fast the process autocovariance goes to as t increases. 

The three parameters of the CIR process can be quickly estimated from 
standard statistics in molecular evolution. The parameter a is a scale param- 
eter. It determines the expected rate at any time given no other information. 
Throughout the paper, we will assume that a = 1, so that the model has an 
expected rate equal to one. This parallels the constraint that the gamma 
distribution has an expected rate equal to one in the Rate-Across-Site (RAS) 
model [J. 

The CIR process has a stationary distribution given by a gamma dis- 
tribution. To make the stationary distribution coincide with the gamma 




(10) 



W*) = l + T5l(fc-l + e- W )- 



Thus 
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distribution of a RAS model with parameter T we choose a and b such that 



r-£ (12) 

The stationary index of dispersion, Icir(oo), can be estimated empirically 
[3*3*1 . We can then use (fTT|) and (f"""""|) to obtain the estimates 



a 2 



Icir{oo) - 1 
f 2 



Icir(oo) - 1 

4 Mutation models with a rate process 

The standard model for the substitution process at a particular locus is a 
continuous-time Markov chain. This kind of process is defined by a square 
matrix Q called the infinitesimal rate matrix. Suppose, to begin, that there 
is a constant evolutionary rate r^. As above, we let Xt denote the state (e.g. 
amino acid) at time t. The transition probabilities are then given by 

Pr[X t = j\X = i] = [e® r ° t ] ij . (13) 

We suppose that the process has a unique stationary distribution ir, where 
ttj is the stationary probability of state j and 

ttj = lim Pr[X t = j\X = i] 

t — >oo 

for all i and j. We assume that Q has been normalised so that in the 
stationary distribution the expected number of mutations over time t equals 
r^t. Note that the transition probabilities (|13j) depend only on the product 
r^t, so will be the same if we double the rate and halve the time, for example. 

Suppose now that the rate is not constant, but instead varies according 
to some fixed function r s , s > 0. Equation ([13*1) then becomes 

Pv[X t =j\X = i,r} = [e Q ^] ir (14) 

where 



r s ds 

=o 



is the area under the curve r,. 
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In the models we will consider, the fixed function r = (rt)t>o is replaced 
by a random process R = (Rt)t>o that is dependent only on the starting 
rate ro. The integral 

PS=t 

t r = / R s ds (15) 

Js=0 

is also random in this case; let g R denote its pdf. The transition probabilities 
can be determined from the expected value of (|14|) with r r replaced by the 
random variable t r . By the law of total expectation, this simplifies to 

Pr[X t = j\X = i} = j[e QT ], 3 g R {T)dT. (16) 

Let M(rj) = E T [e ,? ' rB ] denote the moment generating function (mgf) for 
the random variable t s r. Then (|16|) can be rewritten 

PT[X t =j\X = i] = [M(Q)] ij 

where the function M is interpreted as a matrix function |41j . We assume 
that Q can be diagonalised as Q = VAV~ X , where A = diag(Ai, . . . , A n ) is 
a diagonal matrix formed from the eigenvalues of Q. The matrix function 
M(Q) can then be evaluated as M(Q) = VM(A)V^ 1 , where 

M(A) = diag(M(Ai),...,M(A n )). 

See |121 for a more details on matrix functions. The problem of determining 
pattern probabilities therefore boils down to the problem of determining the 
moment generating function of the integrated rate, r R (eqn. 115(1 . Tuffley 
and Steel use this approach to derive distance estimates for the covarion 
process 

For applications in phylogenetics, we need the mgf of t r conditioned on 
just the starting rate, or both the starting and finishing rate. The mgf of 
tr, conditioned on a starting rate of ro, is then 

M ro (n) = E[exp07TR)|fIo = ro] 









E 


exp yrj J R s ds^j 


Ro = ro 









As before, we let fii(Rt\ro) denote the pdf of Rt conditioned on Rq = tq. 
Let 5(x) denote the Dirac delta function with 5(0) = 1 and 5(x) = for all 
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x ^ 0. The mgf of tr conditioned on both the starting and finishing rates is 



M ro,nO?) = E[exp(r?r fi )|i? = r , Rt = r t ] 
1 



fR(r t \r 
1 

/fi(n|r 



-E[exp(r/r R )<5(iit - r t )|i? = r ] 



-E 



expfrij R s ds\ 8(R t 



Ro = r 



.(18) 



Equations Q17|) and (|18jl hold irrespective of whether i? is discrete or con- 
tinuous, a diffusion, jump process, or a continuous time Markov chain. 

We note in passing that analytic formulae for M rQ {rf) and M rort (rj) exist 
in the case that R is a continuous time Markov chain, for example in the 
covarion-type model of Galtier |TJ]|. Suppose that the evolutionary rate 
switches between rate values gi, g2, ■ ■ ■ 9k following a continuous time Markov 
chain with infinitesimal rate matrix G. Let D be the k x k diagonal matrix 
with entries g\ , g<i , . . . , . A careful reworking of the proof of Theorem 1 in 
pB] gives the mgf of t r conditioned on both the starting and finishing rate. 
The mgf for tr conditioned on ro = g% is then 

while the mgf of tr conditioned on ro = gi and rt = gj is 



M, 



ll.) 



This provides an independent derivation of the formula in [20] for transition 
probabilities under a covarion-type model. 



5 Moment generating functions and transition prob- 
abilities for the CIR model 

In this section we derive expressions for the (joint) transition probabilities 
Pr[X t =j\X = i,R = r }. (19) 

and 

Pi[X t = j, R t = s\X = i,R = r ] (20) 
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As we have seen, to evaluate these probabilities we need to determine the 
moment generating functions (mgfs) defined in equations (|17|) and (|18|). 

We use the Feynman-Kac formula [441 I24| to derive analytic formulae 
for M rQ (rj) and M rt ^ Q (rj) under the CIR model. Let g(-) be a real- valued 
function. Define the function v = v(t,x) by 



v(t,x) = E 



exp ^r] J R(s)ds^j g(R t ) 



Ro = x 



(21) 



The Feynman-Kac formula [2] asserts that v(t,x) solves the following par- 
tial differential equation (PDE) 

did 2 

— v(t, x) = 6(1 - x)—v(t, x) + -a 2 x(t, x)-Q-^v(t, x) + r/xv (22) 

for t > 0, x G R, and with boundary condition 

v(0, x) = g(x) for all iGR, (23) 

We apply the methods in [13] and [IB] to solve these pdes with the different 
boundary conditions. 

First consider the case when we condition only on the initial rate, eqn. (|17|) . 
To make ()21|) equal to Q17JI we set g(x) = 1 for all x. The boundary condition 
(|23|) in this case becomes 

v(0,x) = 1 for all x G R.. 

With this boundary condition, the pde (|22|) has solution 

where 



_>/. 
77 



/ t e W/2 

*(r/,i) = = ^— = I (24) 

\bcosh(bt/2) + bsmh(bt/2) J 

*(r,t) = ( iv^Hbtm \ (25) 

^ ' V6cosh(6t/2) +6sinh(6t/2)/ ' v ' 

b = ^b 2 - 2 W 2 (26) 

We therefore have 

M ro ( V ) = y(v,t)e- ro3(ri ' t) - (27) 
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The case when both the starting and finished rates are specified is more 
complicated. From (|18|) the mgf M ro>rt can be written 

M r ,r t = , , , M t,x) 

fR{r t \r ) 

where, in this case, v(t,x) is given by (|21j) with g{x) = 5{x — rt). The 
boundary condition Q23JI therefore becomes 

v(0, x) = S(x — rt). 

With this new boundary condition, the pde 11221) has solution 

v(t,x) = cexp 



bt n . x b ~ b h + b / x -bt 

— t(o - b) H =-a; =-r$ - cm + xje 



x a -1/2 



xe bt 



b 



I» , 2c^xr t e- bt , (28) 



where 

2b 



a 2 (\-e 



-bt\ 



b = yjb 2 - 2r]a 2 

and I u (x) is the modified Bessel function of the first kind with parameter v 
|47| . Hence the mgf conditioned on initial and final rates is given by 



M r ,n(v) = cexp 



bt - b — b b + b . . _tf 

—^{b — o)-\ ^-ro y-r t - c(r t + r )e 



b 



In-, 2c\Jr r t e 



,r e- 6t / ^ V / fn{ r t\Ro = ro) 

where c and b are defined above and, from Section |3J fft(Rt\Ro = tq) is 
the pdf for a non-central \ 2 distribution with degree of freedom Aab/a 2 and 
parameter of non-centrality j^ffjz^psn ; 

Bringing everything together, we have our main result. 

Theorem 1 Define P by Pij = Pi[X t = j,Rt = r t \Xo = i,Rq = tq]. 
Suppose that Q = VAV~ l where A is a diagonal matrix containing the 
eigenvalues X±, A n ofQ. Then 

P = M(Q) = FM(A)y^ (29) 

where M(A) is the diagonal matrix where, for all i, 

Af(A)« = M rttro (Xi). 
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6 Three taxa phylogenies 



The simplest phylogeny for which we can distinguish between constant and 
variable evolutionary rates is a tree with three taxa. For this reason, there 
are many methods for testing, and estimating, rate variation that are based 
on three taxon analyses j^Hj- Here we show that the likelihood for a three 
taxa tree, under the CIR model of rate variation, can be computed exactly. 
The problem for general phylogenies is more complex since we have to in- 
tegrate out rates for the internal nodes. Here, we consider a heterotachous 
model, so that each site has its own rate history. Because the sites (and the 
rate at each site) evolve independently from each other, the likelihood of a 
sequence will be the product of all site-specific likelihoods. Therefore, we 
only require the likelihood computation for one site. 

We recall that the stationary distribution of the CIR is a gamma distri- 
bution T(u>,u), where lo = 2b /a 2 and v = 2b /a 2 , i.e. 

Mr) = ^r^e-^ . (30) 

Therefore the stationary mean and variance are 1 and a 2 /2b. 

In order to get the transition probabilities, we will use the mgf of t r 
unconditioned on the final rate, given by equation 1|27|). The transition 
probability matrix of the subsitution process, given initial rates, can be 
obtained by equations (jZZj) arid (|29j). Let Aj, . . . , X n be the eigenvalues of 
Q. Using eigenvalue decomposition, we can find vectors u^ 1 ), . . . ,u( n ) and 
vW, . . .,vW such that 

Pv[X t = i\X = j, R = r } = Y^ uf } v™M ro (\ k ,t), (31) 

k=l 

where we changed slightly our notation and explicitly wrote the dependency 
of M rQ on t. 

Now consider the 3-taxa tree with branches of lengths ti,t 2 ,t3 leading 
to leaves labelled with states x±,x 2 , £3 (Figure HJ). If we condition on a rate 
ro and state xq at the root then the probability of observing xi,X2,x^ at 
the leaves is given by 

L(x 1 ,x 2 ,x 3 \x ,r ) = P[X tl = xi\x ,r }P[Xt 2 = x 2 \x , r }P[X t3 =x 3 |x ,r ] 

n n n 

= J2J2J2 Bi ^ M ro(h,h)M ro (X j ,t 2 )M ro (X k ,h) (32) 
i=i j=i k=i 
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Figure 2: A three taxa unrooted tree, with branch lengths and one character 
state and rate value associated to each leaf. 



where 



-Dijk — U Xo v xi u x y x 2 u x y x 3 ■ 



The rate at the root is assumed to have the stationary distribution f Ro 
given by (j50J) . The likelihood integrated with respect to tq is then 

L(xi,x 2 ,x 3 \x ) = / L(x 1 ,x 2 ,x 3 \x ,r )f Ro (r Q )dr 

Jro 

which by equals 

n n n „ 

J2^2J2 B »k / M ro (X i ,t 1 )M ro (X j ,t 2 )M ro (X k ,t 3 )f Ro (r )dr . (33) 
i=l j=i k=i Jr ° 

We now use the formula Q27|) for the mgf's derived above. 

M ro ( Aj , t 1 ) M ro (Xj,t 2 ) M ro (X k ,t 3 )f Ro (r ) dr 
= ^(X^e-^^iX^t^e-^ 



V(X U ti)*(A i5 t 2 )*(A fc , t 3 )_^^-i e -'-o(-+3(A i ,t 1 )+3(A j ,t 2 )+3(A fc ,t3)) (34) 
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Using integration by parts, or simply using the fact that the gamma distri- 
bution integrates to 1, we get 

f M ro (Xi,ti)M ro ( Xj ,t 2 )M ro (X k ,t 3 )f Ro (r ) dr 

J ro 

= .(*,«•(*,.«»(*,,« ( , +5(Ai , tl)+ 4, t3)+5(At , t3) )"- 

Finally, we can substitute this back into (|33|) to obtain 

n n n 

L(x 1 ,x 2 ,x 3 \x ) = J2J2J2B i:jk ^(X u t 1 )^(X j ,t2)^(X k ,t 3 ) 
i=i j=i k=i 

x / u 

K + 3(Ai, ti) + E(Xj,t 3 ) + S(A fc) t 3 ) 

The formula extends immediately to phylogenies with n leaves attached 
to the root, though the number of terms in the summation increases ex- 
ponentially. Our approach has been to evaluate likelihoods on complete 
phylogenies using Monte-Carlo techniques, together with the exact transi- 
tion probabilities derived here. 



7 Discussion 

7.1 Summary 

We have shown how, given a few natural criteria for our model selection, 
the CIR appeared as the simplest continuous model that is at the same time 
ergodic, has a non-zero autocovariance function and that can account for an 
arbitrarily large index of dispersion. Moreover, we provided simple ways to 
estimate its parameters with the help of two observable statistics, namely 
the RAS gamma parameter and empirical index of dispersion. Another very 
interesting practical aspect of the CIR process is that it can be easily, and 
without approximations, implemented in the MCMC framework. 

7.2 Future extensions 

A possible future extension of our model could involve jump models, in which 
the rate path is discontinuous as in the continuous-time Markov chain, but 
also varies as diffusion between these discontinuities. However, the use of 
such a model implies the use of more parameters, and it may well be the 
case that the relative weakness of the rate of evolution signal cannot allow 
the use of more than two parameters, because of identifiability problems. 
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